Internet Info 1997 December

Internet Message Format | 1997-02-19 | 8KB

Received: (from daemon@localhost) by services.bunyip.com (8.6.10/8.6.9)
    id NAA27465 for urn-ietf-out; Mon, 21 Oct 1996 13:39:55 -0400
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1])
    by services.bunyip.com (8.6.10/8.6.9) with SMTP id NAA27460
    for <urn-ietf@services.bunyip.com>; Mon, 21 Oct 1996 13:39:51 -0400
Received: from josef.ifi.unizh.ch by mocha.bunyip.com with SMTP
    (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA21595
    (mail destined for urn-ietf@services.bunyip.com);
    Mon, 21 Oct 96 13:39:01 -0400
Received: from ifi.unizh.ch by josef.ifi.unizh.ch
    id <01070-0@josef.ifi.unizh.ch>; Mon, 21 Oct 1996 19:38:54 +0100
Subject: Re: [URN] Pre release of URN Syntax document....
To: jayhawk@ds.internic.net
Date: Mon, 21 Oct 1996 19:38:53 +0100 (MET)
Cc: urn-ietf@bunyip.com, leslie@bunyip.com, michaelm@internic.net,
    rdaniel@acl.lanl.gov, paf@swip.net
In-Reply-To: <9610181749.AA07976@mocha.bunyip.com>
    from "Ryan Moats" at Oct 18, 96 12:52:34 pm
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 6658
From: Martin J Duerst <mduerst@ifi.unizh.ch>
Message-Id: <"josef.ifi..310:21.09.96.18.38.55"@ifi.unizh.ch>
Sender: owner-urn-ietf@services.bunyip.com
Precedence: bulk
Reply-To: Martin J Duerst <mduerst@ifi.unizh.ch>
Errors-To: owner-urn-ietf@bunyip.com

Ryan Moats wrote:

>Well folks, we've reached the point where this is ready for the list
>to scream about (semi :-) ).

Well, first a great scream of joy and congratulations that you
have made it with iso10646/UTF-8. More comments on that below.

>Internet-Draft                                        Ryan Moats
>draft-ietf-urn-syntax-00.txt                          AT&T
>Expires in six months                                 October 1996

>Abstract
>
>   This document presents the syntax for Uniform Resource Names (URNs).
>   More information on the purpose of URNs is available from [1].
>
>1.  Syntax
>
>   All URNs have the following syntax:
>
>
>      <URN> ::= ["urn:"] <NID> ":" <NSS>
>
>   <NID> is the Namespace IDentifier, and <NSS> is the Namespace
>   Specific String.
>   The Namespace ID is used to determine the
>   _syntactic_ interpretation of the Namespace Specific String to the
>   extent of extracting the Naming Authority information (as discussed
>   in [1]).

>1.1  Namespace Identifier Syntax
>
>   The following is the syntax for the Namespace Identifier.  To (a) be
>   consistent with all potential resolution schemes and (b) not put any
>   undue constraints on any potential resolution scheme, the syntax for
>   the Namespace Identifier is:
>
>      <NID> ::= <letter> [ <let-hyp> ]
>
>      <let-hyp> ::= <letter> | "-"
>
>      <letter> ::= any one of the 52 alphabetic characters A through Z
>                   in upper case and a through z in lower case
>
>   This is slightly more restrictive that what is stated in RFC 1738 [4]
>   (which allows the period ".").  Further, the Namespace Identifier is
>   case insensitive, so that "ISBN" and "isbn" refer to the same
>   namespace.

Kind of disappointed that i18n didn't make it for NIS. But it's
probably not necessary. And if it becomes necessary, we could just
create a new NIS, "i" for short, and could prefix it. Or am I
stretching my understanding of URN syntax too much?

>1.2  Namespace Specific String Syntax
>
>   Depending on the rules governing a namespace, valid identifiers in a
>   namespace might contain characters that are reserved characters in
>   URI syntax or non-printable ASCII characters.  To accommodate the
>   largest set of valid identifiers, the NSS portion of a URN shall use
>   UTF-8 representation of ISO 10646 as its character set.

This is really what I have waited for for a long time. As I wrote
in a different mail, I have discussed that with many people on
many occasions. It is definitely the right direction to go. But
it needs a few refinements to assure it will work nicely, and
to assure it won't be getting unnecessary opposition. Hopefully,
these things can be integrated in this document; maybe another
document is needed.

One main problem is that while the URL/URI didn't specify any
character semantics, this proposal defines everything in terms
of characters. For URL/URIs, it was not defined what character,
or part of a character, bytes above 0x7F were denoting, nor was
one even sure, after reading the specs, that an "A" in a URL
was supposed to be the character "A". One only got a certain
confidence in such coincidences after having a look at some
actual examples. This had some advantages, as it left open any
decisions for the individual protocols, and it also considered
cases where the information in the URL did not have anything to
do with characters (e.g. something like the data URL).

So I think the spec should go more in the direction of:

- If an NSS contains characters, these should be encoded using UTF-8.
- If an NSS contains something else (e.g. pure data), this can be
  encoded directly using %HH.
- If an NSS, for some reason, decides to use a different encoding
  for characters, this is not prohibited (but not suggested).

When the second point is introduced, the third cannot be avoided
anyway, because an NSS scheme may just define that it does %HH
encoding on its own, using a different character syntax, and the
NSS encoding of the characters in that scheme would only trivially
encode ASCII characters (with lots of "%" in it). This will also
mean that NSS are not limited to correct UTF-8 sequences.

The other big problem is equivalence. For Unicode, character
equivalence is in some cases not the same as codepoint equivalence.
Character equivalence is well defined, but there is currently no
standard for normalization. This does not concern things such as
case, where the user can easily distinguish lower case and upper
case, but cases such as A-with-Grave, which can be encoded both as
one single codepoint and as the sequence A, Grave.

Another problem that should be considered is bidirectionality for
Hebrew and Arabic.
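
The A-with-Grave case can be made concrete with a short Python sketch. It is only an illustration, not part of the draft or of the original message: the helper names `percent_encoded_utf8` and `equivalent_after_nfc` are hypothetical, and the use of NFC is an assumption. The Unicode normalization forms in Python's `unicodedata` module were standardized after this message was written.

```python
import unicodedata
from urllib.parse import quote


def percent_encoded_utf8(text: str) -> str:
    """Encode text as UTF-8, then %HH-escape every byte that is not unreserved."""
    return quote(text, safe="", encoding="utf-8")


def equivalent_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (an assumed choice)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00C0"      # LATIN CAPITAL LETTER A WITH GRAVE, one codepoint
decomposed = "A\u0300"   # "A" followed by COMBINING GRAVE ACCENT

# The two spellings give different UTF-8 bytes, hence different %HH forms.
print(percent_encoded_utf8(composed))    # %C3%80
print(percent_encoded_utf8(decomposed))  # A%CC%80

# Codepoint comparison says "different"; comparison after NFC says "same".
print(composed == decomposed)                      # False
print(equivalent_after_nfc(composed, decomposed))  # True
```

A resolver that compares only the decoded bytes would treat these two names as different, which is the gap between codepoint equivalence and character equivalence described above.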

>   URN resolvers MUST be capable of accepting URNs that have been
>   %encoded for either 8-bit clean or 7-bit transports.  %encoding is
>   removed first, then UTF-8 decoding is performed.  URN resolvers MUST
>   return identical results from ANY legally encoded form of the URN.

It would be a good idea if URN resolvers could also care about
Unicode character equivalence. But maybe some schemes would do
that more easily than others?

>   It should be noted that certain characters in the Namespace Specific
>   String syntax may have special meaning in certain namespaces.
>   Therefore, each namespace shall indicate which characters have
>   special meaning and how (if possible) to encode those characters if
>   used in a literal sense.

Rather a great requirement on implementors, because they have
to know new things for every new namespace. But maybe it cannot
be avoided.
The question here is: How to escape a character beyond ASCII?
Would it be possible to specify that all characters in ASCII,
if in %HH form, are to be considered escaped?

>2.  Grand-fathering
>
>   To allow for grand-fathering of existing naming systems (as required
>   by [2]), the Namespace Specific String shall be considered an "opaque
>   string" in the sense of structure except as mentioned in Section 1.

Just the fact that it is considered a string of characters in
all cases is somewhat restricting.

>3.  Lexical Equivalence
>   Lexical equivalence at the server is dependent on the namespace, and
>   thus each namespace shall indicate how lexical equivalence may be
>   determined by servers for URNs specifying resources in that
>   namespace.

As said, some general help or suggestions for Unicode character
equivalence might be in order here. This does not restrict
individual namespaces to go further in declaring equivalences.

That's it, for the moment. Looking forward to interesting
discussions.

Regards,	Martin.
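
For the resolver rule quoted earlier in this message ("%encoding is removed first, then UTF-8 decoding is performed"), the following Python sketch shows one way the two steps could be ordered. It is a sketch under assumptions, not the draft's algorithm: the function name `decode_nss` is hypothetical, and returning `None` for byte sequences that are not valid UTF-8 is an assumption made here to cover the "pure data" case raised above.

```python
from typing import Optional, Tuple
from urllib.parse import unquote_to_bytes


def decode_nss(nss: str) -> Tuple[bytes, Optional[str]]:
    """Undo %HH escaping first, then attempt UTF-8 decoding.

    Returns the raw bytes and, if they form valid UTF-8, the decoded text.
    For non-UTF-8 data the text part is None (an assumption of this sketch).
    """
    # Step 1: remove %encoding. Unescaped non-ASCII characters in the input
    # are encoded to UTF-8 bytes, so 8-bit and 7-bit forms converge here.
    raw = unquote_to_bytes(nss)
    # Step 2: UTF-8 decoding of the resulting byte sequence.
    try:
        return raw, raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw, None


# Three encodings of the same one-character NSS give identical results.
print(decode_nss("%C3%80") == decode_nss("%c3%80") == decode_nss("\u00C0"))  # True

# Bytes that are not valid UTF-8 are kept as raw data.
print(decode_nss("%FF%FE"))  # (b'\xff\xfe', None)
```

Doing the steps in this order means every legally encoded form of a name reduces to the same byte sequence before any character interpretation takes place.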